Detecting Bulk Document Leakage

Authors

Balachander Krishnamurthy

Saurabh Kumar

Ashlesh Sharma

Lakshminarayanan Subramanian

Abstract

Enterprises and corporations have recently reported several incidents in which sensitive data was leaked or copyrighted information was plagiarized on the Web. We address the Bulk Document Leakage (BDL) problem: given a large set of sensitive or copyrighted documents in an enterprise, determine whether a portion of the document set has leaked (or has been plagiarized) and been published on the Web. An adversary who wishes to evade detection may partially tamper with the content before publishing it. We present an automated, tamper-proof, low-complexity algorithm to solve the BDL problem. We extract embedded signatures from sensitive documents and use them in conjunction with search engines to determine whether near-duplicate versions of a document (or portions of it) are available on the Web. The embedded signature is tamper-proof: even if an adversary partially modifies a document, our mechanism can detect duplicate copies. Moreover, if a duplicate copy is present on the Web, our system can detect it with a small number of queries to a search engine. We have tested the validity and tamper-proof properties of our algorithms over a wide range of documents and corpora gathered from different large enterprises. Based on the access logs of a large enterprise, we show real-world evidence of bulk leakage across several other domains.
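The abstract does not say how the signatures are built or which search engine is queried, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses short phrases of rare words as a stand-in for the paper's embedded signatures, and a caller-supplied `search` function as a stand-in for a search engine. The names `extract_signatures`, `looks_leaked`, `corpus_freq`, and `min_hits` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_signatures(doc: str, corpus_freq: Counter,
                       k: int = 3, n: int = 4) -> List[str]:
    """Return up to n distinct k-word phrases from doc, rarest first.

    Assumption: a phrase is "rare" when the summed background frequency
    of its words is low. This is an illustrative heuristic, not the
    signature scheme described in the paper.
    """
    words = tokenize(doc)
    shingles = [words[i:i + k] for i in range(len(words) - k + 1)]
    scores = {" ".join(s): sum(corpus_freq[w] for w in s) for s in shingles}
    # Sort by rarity score, then alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:n]]


def looks_leaked(signatures: List[str],
                 search: Callable[[str], bool],
                 min_hits: int = 2) -> bool:
    """Flag a document when at least min_hits signatures are found.

    `search` is a hypothetical callable that returns True when the
    exact phrase appears somewhere on the Web; no real search engine
    API is assumed here.
    """
    hits = sum(1 for sig in signatures if search(sig))
    return hits >= min_hits


if __name__ == "__main__":
    background = Counter(tokenize("the the the for for by of and the internal report"))
    original = ("The confidential zephyr roadmap targets the quantum "
                "gateway launch for spring")
    # A partially modified copy standing in for a page found on the Web.
    page = " ".join(tokenize("Leaked: the confidential zephyr roadmap aims at "
                             "the quantum gateway launch for autumn"))
    sigs = extract_signatures(original, background)
    print(sigs, looks_leaked(sigs, lambda q: q in page))
```

Because several independent phrases are drawn from each document, a partial edit that destroys one phrase still leaves others intact, which is the intuition behind detecting tampered copies with only a handful of queries per document.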

Similar resources

Content-based data leakage detection using extended fingerprinting

Protecting sensitive information from unauthorized disclosure is a major concern of every organization. As an organization’s employees need to access such information in order to carry out their daily work, data leakage detection is both an essential and challenging task. Whether caused by malicious intent or an inadvertent mistake, data loss can result in significant damage to the organization...

CoBAn: A context based model for data leakage prevention

A new context-based model (CoBAn) for accidental and intentional data leakage prevention (DLP) is proposed. Existing methods attempt to prevent data leakage by either looking for specific keywords and phrases or by using various statistical methods. Keyword-based methods are not sufficiently accurate since they ignore the context of the keyword, while statistical methods ignore the content of t...

Simple Procedures for Detecting Network Attachment in IPv6

This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text...

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

Impact of quantum mechanical tunneling on off-leakage current in double-gate MOSFET using a quantum drift-diffusion model

With the growing use of wireless electronic systems, off-state leakage current in MOSFETs appears as one of the major physical limitations. Measurements of quantum tunnel current between source-drain (S-D) have recently shown that it will become detrimental in bulk MOSFET architecture for channel lengths around 5nm and at low temperature (≤100K) [1]. In this paper we investigate, using a 2D qua...

Publication date: 2008